NSF PAR Search | NSF Public Access Repository

Teaching Models to Balance Resisting and Accepting Persuasion

https://doi.org/10.18653/v1/2025.naacl-long.412

Stengel-Eskin, Elias; Hase, Peter; Bansal, Mohit (January 2025, Association for Computational Linguistics)

AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge

https://doi.org/10.18653/v1/2025.naacl-long.581

Wang, Han; Prasad, Archiki; Stengel-Eskin, Elias; Bansal, Mohit (January 2025, Association for Computational Linguistics)

MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration

https://doi.org/10.18653/v1/2025.naacl-long.498

Wan, David; Chen, Justin; Stengel-Eskin, Elias; Bansal, Mohit (January 2025, Association for Computational Linguistics)

GTBench: Uncovering the Strategic Reasoning Capabilities of LLMs via Game-Theoretic Evaluations

Duan, Jinhao; Zhang, Renming; Diffenderfer, James; Kailkhura, Bhavya; Sun, Lichao; Stengel-Eskin, Elias; Bansal, Mohit; Chen, Tianlong; Xu, Kaidi (December 2024, Neural Information Processing Systems Foundation, Inc. (NeurIPS))

As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial. This paper evaluates LLMs' reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. We first propose GTBench, a language-driven environment comprising 10 widely recognized tasks across a comprehensive game taxonomy: complete versus incomplete information, dynamic versus static, and probabilistic versus deterministic scenarios. Then, we (1) characterize the game-theoretic reasoning of LLMs and (2) perform LLM-vs.-LLM competitions as a reasoning evaluation. We observe that (1) LLMs behave distinctly across gaming scenarios; for example, LLMs fail in complete and deterministic games, yet they are competitive in probabilistic gaming scenarios; (2) most open-source LLMs, e.g., CodeLlama-34b-Instruct and Llama-2-70b-chat, are less competitive than commercial LLMs, e.g., GPT-4, in complex games, yet the recently released Llama-3-70b-Instruct makes up for this shortcoming. In addition, code pretraining greatly benefits strategic reasoning, while advanced reasoning methods such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT) do not always help. We further characterize the game-theoretic properties of LLMs, such as equilibrium and Pareto efficiency in repeated games. Detailed error profiles are provided for a better understanding of LLMs' behavior. We hope our research provides standardized protocols and serves as a foundation to spur further exploration of the strategic reasoning of LLMs.
Guiding Multi-Step Rearrangement Tasks with Natural Language Instructions

Stengel-Eskin, Elias; Hundt, Andrew; He, Zhuohong; Murali, Aditya; Gopalan, Nakul; Gombolay, Matthew; Hager, Gregory (November 2022, Proceedings of Machine Learning Research)

Guiding Multi-Step Rearrangement Tasks with Natural Language Instructions

Stengel-Eskin, Elias; Hundt, Andrew; He, Zhuohong; Murali, Aditya; Gopalan, Nakul; Gombolay, Matthew; Hager, Gregory (October 2021, Proceedings of Machine Learning Research)
Faust, Aleksandra; Hsu, David; Neumann, Gerhard (Ed.)
Enabling human operators to interact with robotic agents using natural language would allow non-experts to intuitively instruct these agents. Towards this goal, we propose a novel Transformer-based model which enables a user to guide a robot arm through a 3D multi-step manipulation task with natural language commands. Our system maps images and commands to masks over grasp or place locations, grounding the language directly in perceptual space. In a suite of block rearrangement tasks, we show that these masks can be combined with an existing manipulation framework without re-training, greatly improving learning efficiency. Our masking model is several orders of magnitude more sample efficient than typical Transformer models, operating with hundreds, not millions, of examples. Our modular design allows us to leverage supervised and reinforcement learning, providing an easy interface for experimentation with different architectures. Our model completes block manipulation tasks with synthetic commands more often than a UNet-based baseline, and learns to localize actions correctly while creating a mapping of symbols to perceptual input that supports compositional reasoning. We provide a valuable resource for 3D manipulation instruction following research by porting an existing 3D block dataset with crowdsourced language to a simulated environment. Our method’s absolute improvement in identifying the correct block on the ported dataset demonstrates its ability to handle syntactic and lexical variation.
Iterative Paraphrastic Augmentation with Discriminative Span Alignment

https://doi.org/10.1162/tacl_a_00380

Culkin, Ryan; Hu, J. Edward; Stengel-Eskin, Elias; Qin, Guanghui; Van Durme, Benjamin (January 2021, Transactions of the Association for Computational Linguistics)
We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing datasets or the rapid creation of new datasets using a small, manually produced seed corpus. We demonstrate our approach with experiments on the Berkeley FrameNet Project, a large-scale language understanding effort spanning more than two decades of human labor. With four days of training data collection for a span alignment model and one day of parallel compute, we automatically generate and release to the community 495,300 unique (Frame, Trigger) pairs in diverse sentential contexts, a roughly 50-fold expansion atop FrameNet v1.7. The resulting dataset is intrinsically and extrinsically evaluated in detail, showing positive results on a downstream task.
Joint Universal Syntactic and Semantic Parsing

https://doi.org/10.1162/tacl_a_00396

Stengel-Eskin, Elias; Murray, Kenton; Zhang, Sheng; White, Aaron Steven; Van Durme, Benjamin (January 2021, Transactions of the Association for Computational Linguistics)
